
A Better Approach for Filtering Webspam in Google Analytics

Ali JalilPour June 27, 2024


“Don’t throw the baby out with the bathwater” is a popular saying that has been around since the 16th century, and it is no less relevant today, especially when it comes to webspam in Google Analytics. Frustration with spam has led site owners to throw out genuine data along with it.

Spam in analytics could be the single most irritating thing in online marketing. Numerous blog posts have been written on the topic.

One particular solution consistently surfaces as the fastest way to get rid of spam: set up one or two filters in your analytics, and you’re free of spam forever. This strategy includes only valid hostnames in order to filter out ghost spammers, the most aggressive type of spammer. Ghost spam never actually visits your site; it sends fake hits straight to Google Analytics, so those hits usually carry a fake or unset hostname.

Even though this solution seems valid, it is also the riskiest one, because you are likely to lose valuable data and insights in the process.


Why is this seemingly valid option risky?

Using the two-filter option is risky for two reasons: it relies on inclusion instead of exclusion, and it treats any hit with an unset hostname as spam.

Inclusion versus exclusion

Inclusion: only allows data from known genuine sources and discards everything else
Exclusion: only blocks data from known spam sources and keeps everything else
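To make the difference concrete, here is a minimal Python sketch that applies both strategies to the hostname of individual hits. The domain lists and the `keep_with_inclusion` / `keep_with_exclusion` helpers are hypothetical illustrations, not Google Analytics settings; `(not set)` is the value Google Analytics reports when no hostname was sent.

```python
# Minimal sketch: inclusion vs. exclusion applied to a hit's hostname.
# All domain lists below are hypothetical examples.

NOT_SET = "(not set)"  # how Google Analytics reports a missing hostname

KNOWN_GENUINE = {"www.mydomain.com", "mydomain.com", "blog.mydomain.com"}
KNOWN_SPAM = {"spam-one.example", "spam-two.example"}  # hypothetical spam hostnames


def keep_with_inclusion(hostname: str) -> bool:
    """Inclusion: keep a hit only if its hostname is on the allow list."""
    return hostname in KNOWN_GENUINE


def keep_with_exclusion(hostname: str) -> bool:
    """Exclusion: drop a hit only if its hostname is on the block list."""
    return hostname not in KNOWN_SPAM


hits = ["www.mydomain.com", "spam-one.example", NOT_SET, "translate.googleusercontent.com"]

for host in hits:
    print(host, "inclusion:", keep_with_inclusion(host), "exclusion:", keep_with_exclusion(host))
```

Both strategies agree on hostnames they already know about. They differ on everything else: a genuine hit with an unset hostname, or one served through a translation service nobody remembered to list, is discarded by inclusion but kept by exclusion.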

What’s a hostname?

The hostname tells you the domain on which your website was visited, that is, where your tracking code actually ran.

This can be any (sub)domain you claimed, like www.mydomain.com, mydomain.com, blog.mydomain.com or mydomain.co.uk. However, the hostname could also be the domain of translation, cache, or shopping services like translate.googleusercontent.com or paypal.com.
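In practice, the valid-hostname filter is usually a single regular expression that must name every domain above. The sketch below uses Python’s `re` module to test such a pattern; the pattern itself, the `passes_include_filter` helper, and the subdomain `shop.mydomain.com` are hypothetical examples built from the domains in this section.

```python
import re

# Hypothetical valid-hostname pattern built from the example domains above.
# Every legitimate (sub)domain and third-party service must be listed explicitly.
VALID_HOSTNAMES = re.compile(
    r"^((www|blog)\.)?mydomain\.(com|co\.uk)$"
    r"|^translate\.googleusercontent\.com$"
    r"|^paypal\.com$"
)


def passes_include_filter(hostname: str) -> bool:
    """Return True if the hostname matches the allow-list pattern."""
    return VALID_HOSTNAMES.search(hostname) is not None


for host in ["www.mydomain.com", "mydomain.co.uk", "paypal.com",
             "shop.mydomain.com",  # a new subdomain nobody added to the pattern
             "(not set)"]:         # a hit whose hostname was never sent
    print(host, passes_include_filter(host))
```

Python’s `search` matches anywhere in the string unless the pattern is anchored, which is why each alternative is wrapped in `^` and `$`; without them, a spoofed hostname such as `mydomain.com.spam.example` would pass.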

The valid-hostname strategy works perfectly in a vacuum. In real life, however, we have seen too many cases where it could have gone terribly wrong:

Over a span of months or years, you work with multiple people and agencies. They don’t always know what was previously set up.
The internet and your business will evolve and more genuine sources will appear. Who will make sure they are always included from day one?
Plus, a minor technical error in your code may cause your hostname to be “not set.” This would make your genuine data appear as spam. It wouldn’t pass the inclusion filter, and you’d never even know it.

Real-life data needs a real-life solution

With the inclusion strategy, any of the above real-life scenarios would cause you to lose genuine data.

In fact, one of our clients would have deleted all of the brand’s conversion data if they’d used the two-filter solution, solely because of a third-party plug-in that was implemented by another agency.

The plug-in replaced the real session with a new session that carried no hostname data.

What’s the best alternative?

Only filter out traffic when you’re 100 percent sure it’s spam. That is the exclusion approach, and it has its downsides, of course:

You have to make sure your exclusion filters are always up to date with the latest spammers.
You will let some spam through, for instance visits with an unset hostname that really are spam.
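Keeping exclusion filters current is easier when the block list lives in one place and the pattern is regenerated from it. This minimal sketch assumes, for illustration, that known spam is identified by hostname; the domains and the `build_exclude_pattern` / `is_excluded` helpers are hypothetical. It also shows both downsides listed above: a spammer that is not yet on the list gets through, and so does a hit with an unset hostname.

```python
import re

# Hypothetical block list of known spam hostnames; it has to be reviewed
# and extended whenever a new spammer shows up.
KNOWN_SPAM_HOSTNAMES = ["spam-one.example", "spam-two.example"]


def build_exclude_pattern(domains: list[str]) -> re.Pattern:
    """Compile one anchored alternation; re.escape keeps the dots literal."""
    alternatives = "|".join(re.escape(d) for d in domains)
    return re.compile(rf"^({alternatives})$")


def is_excluded(hostname: str, pattern: re.Pattern) -> bool:
    """Return True if the hit should be dropped as known spam."""
    return pattern.search(hostname) is not None


pattern = build_exclude_pattern(KNOWN_SPAM_HOSTNAMES)
print(is_excluded("spam-one.example", pattern))    # listed spammer: dropped
print(is_excluded("spam-three.example", pattern))  # new spammer: gets through
print(is_excluded("(not set)", pattern))           # unset hostname: kept

# Keeping the filter up to date means adding the new spammer and rebuilding.
pattern = build_exclude_pattern(KNOWN_SPAM_HOSTNAMES + ["spam-three.example"])
print(is_excluded("spam-three.example", pattern))
```

Because `re.escape` keeps each dot literal and the pattern is anchored, a listed domain such as `spam-one.example` cannot accidentally match an unrelated hostname that merely contains similar text.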

Based on the data in our clients’ accounts, these spammers account for 0.4 percent, on average, of all traffic.

This means your analytics would, on average, retain 99.6 percent accuracy without the risk of losing genuine traffic.

Back to you

So, what’s your take on dealing with spam in analytics?

If you’re like us, you’d rather filter out real spam while minimizing the risk of excluding real data.
